Abstract. It is a fundamental property of non-letter Lyndon words that they can be expressed as a concatenation of two shorter Lyndon words. This leads to a naive lower bound $\log_2(n) + 1$ for the number of Lyndon factors that a Lyndon word of length $n$ must have. But this bound is not optimal. In this paper we show that a much more accurate lower bound is $\log_\phi(n)$, where $\phi = (1 + \sqrt{5})/2$ is the golden ratio. We show that this bound is optimal in that it is attained by the Fibonacci Lyndon words, and that they are the unique Lyndon words doing so. As an application, we introduce a mapping $\mathcal{L}_{\mathbf{x}}$ that counts the number of Lyndon factors of length at most $n$ of an infinite word $\mathbf{x}$. We show that a recurrent infinite word $\mathbf{x}$ is aperiodic if and only if $\mathcal{L}_{\mathbf{x}} \ge \mathcal{L}_{\mathbf{f}}$, where $\mathbf{f}$ is the Fibonacci infinite word, with equality if and only if $\mathbf{x}$ is in the shift orbit closure of $\mathbf{f}$.

Keywords: Lyndon word, Fibonacci word, central word, golden ratio, Sturmian word, periodicity.

1. Introduction 

Lyndon words are primitive words that are the lexicographically smallest words in their conjugacy classes [ ]. Originally defined in the context of free Lie algebras [6], Lyndon words have proved to be a useful tool for a variety of problems in combinatorics, ranging from the construction of de Bruijn sequences [13] to proving the optimal lower bound for the size of uniform unavoidable sets [5]. One of the fundamental properties of Lyndon words is their recursive nature: if $w$ is a non-letter Lyndon word, then there exist two shorter Lyndon words $u$ and $v$ such that $w = uv$. This implies that the number of different Lyndon factors of $w$ is bounded below by $\log_2|w| + 1$, but a little experimentation quickly shows that this is hardly optimal. One of the results of this paper (Corollary 1) is that a much better lower bound is $\log_\phi|w|$, where $\phi = (1 + \sqrt{5})/2$ is the golden ratio. This bound is optimal in the sense that the Fibonacci Lyndon words attain it. In fact, they are the unique words doing so. More precisely (Theorem 1), we show that if $w$ is a Lyndon word with $|w| \ge F_n$, where $F_n$ is the $n$th Fibonacci number, then the number of Lyndon factors in $w$ is at least $n$, with equality if and only if $w$ equals one of the two Fibonacci Lyndon words of length $F_n$, up to renaming the letters.

It also makes sense to count the number of Lyndon factors of infinite words, but there we have to use caution: if an infinite word is aperiodic, it will have infinitely many Lyndon factors (Corollary 2). Thus we define a mapping $\mathcal{L}_{\mathbf{x}}\colon \mathbb{N} \to \mathbb{N}$ for which $\mathcal{L}_{\mathbf{x}}(n)$ is the number of distinct Lyndon factors of length at most $n$ of a given infinite word $\mathbf{x}$. Of special importance is the Fibonacci infinite word $\mathbf{f}$. Our first main result in this setting (Theorem 3) is that if $\mathbf{x}$ is aperiodic, then $\mathcal{L}_{\mathbf{x}} \ge \mathcal{L}_{\mathbf{f}}$. This is a significant improvement of a classic result by Ehrenfeucht and Silberger [12], who showed that an aperiodic infinite word must have arbitrarily long unbordered factors. Indeed, Lyndon words being unbordered, we can now say exactly how many unbordered factors of at most a given length there must be. If we confine ourselves to recurrent infinite words, which is not a serious restriction, then the above result can be improved as follows (Theorem 4). A recurrent infinite word $\mathbf{x}$ is aperiodic if and only if $\mathcal{L}_{\mathbf{x}} \ge \mathcal{L}_{\mathbf{f}}$, with equality if and only if $\mathbf{x}$ is in the shift orbit closure of the Fibonacci word $\mathbf{f}$, up to renaming the letters.

Date: 17 July 2012.

KALLE SAARI

The Fibonacci word is a kind of universal optimality prover in that it possesses a wide range of extremal properties; see, e.g., [3, 7, 10, 19].

2. Preliminaries 

In this section we establish the notation of this paper and present some preliminary results. We assume the reader is familiar with the usual terminology of words and languages as given in [1] or [18].

Let $\mathcal{A}$ be a finite, nonsingular alphabet totally ordered by $<$; thus every pair of distinct letters $a, b \in \mathcal{A}$ satisfies either $a < b$ or $b < a$, but not both. We use the same symbol '$<$' to denote the usual order relation among the integers, but this should not cause problems as the context always tells which order is meant. In what follows, we sometimes assume that $0, 1 \in \mathcal{A}$, sometimes $a, b \in \mathcal{A}$, and then their mutual order is implicitly assumed to be their "natural order," so that $0 < 1$ and $a < b$.

The set of all finite words over $\mathcal{A}$ is denoted by $\mathcal{A}^*$, and the set of finite words excluding the empty word $\varepsilon$ is denoted by $\mathcal{A}^+$.

Let $w = a_1a_2\cdots a_n$ be a nonempty finite word with $a_i \in \mathcal{A}$ and $n \ge 1$. The length of $w$ is $|w| = n$. (We denote the cardinality of a set $X$ by $\#X$.) The reversal of $w$ is the word $\tilde{w} = a_na_{n-1}\cdots a_1$. If $\tilde{w} = w$, then $w$ is a palindrome. The word $w$ has period $p \ge 1$ if $a_{i+p} = a_i$ for all $i = 1, 2, \ldots, n - p$. According to this definition, any $p \ge n$ is a period of $w$. If $p < n$, then $p$ is a period of $w$ if and only if there exist words $x, y, z \in \mathcal{A}^*$ such that $w = xy = zx$ and $|y| = |z| = p$. If $w$ has no periods smaller than $|w|$, then it is called unbordered; otherwise $w$ is bordered. Suppose that $w = uvt$ with $u, v, t \in \mathcal{A}^*$. Then $u$, $v$, $t$ are called a prefix, factor, and suffix of $w$, respectively. In addition, $u$ and $t$ are proper if they do not equal $w$. We say that a word $z \in \mathcal{A}^+$ is a periodic extension of $w$ if $z$ is a prefix of a word in $w^+$. We abuse the word "extension" here in that we allow an "extension" to be a prefix of $w$. The word we get from $w$ by deleting its last letter is denoted by $w^-$; thus $w^- = a_1a_2\cdots a_{n-1}$. Also, if $w = xy$ for some words $x, y$, we denote $x^{-1}w = y$ and $wy^{-1} = x$. The word $w$ is primitive if it cannot be written in the form $w = u^k$ for a word $u \in \mathcal{A}^+$ and an integer $k \ge 2$. If $w = uv$, then the word $vu$ is called a conjugate of $w$. The set of all conjugates of $w$ is called the conjugacy class of $w$.
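These definitions are purely combinatorial, so they can be checked mechanically on examples. The following minimal Python sketch is not part of the paper: words are encoded as `str`, and the helper names `has_period`, `periods`, `is_unbordered`, `is_primitive`, and `is_periodic_extension` are hypothetical.

```python
def has_period(w: str, p: int) -> bool:
    """p >= 1 is a period of w if w[i] == w[i + p] wherever both are defined;
    in particular every p >= len(w) is a period."""
    return p >= 1 and all(w[i] == w[i + p] for i in range(len(w) - p))


def periods(w: str) -> list[int]:
    """All periods p of w with 1 <= p <= len(w)."""
    return [p for p in range(1, len(w) + 1) if has_period(w, p)]


def is_unbordered(w: str) -> bool:
    """w is unbordered iff it has no period smaller than len(w)."""
    return len(w) > 0 and min(periods(w)) == len(w)


def is_primitive(w: str) -> bool:
    """w is primitive iff it differs from each of its nontrivial conjugates."""
    return len(w) > 0 and all(w[i:] + w[:i] != w for i in range(1, len(w)))


def is_periodic_extension(z: str, w: str) -> bool:
    """z is a periodic extension of w iff z is a prefix of some word in w^+."""
    if not z or not w:
        return False
    k = len(z) // len(w) + 1  # w^k is at least as long as z
    return (w * k)[: len(z)] == z


print(periods("abaaba"))                               # [3, 5, 6]
print(is_unbordered("aabab"), is_unbordered("abaab"))  # True False
print(is_periodic_extension("aaba", "aab"))            # True
```

Note that, as in the text, a prefix of $w$ such as `"aa"` also counts as a periodic extension of `"aab"`.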

Lemma 1 (Castelli, Mignosi, and Restivo [4]). Let $w \in \mathcal{A}^+$ be a word with periods $p$ and $q$.

(i) If $q < p \le |w|$, then the prefix and the suffix of $w$ of length $|w| - q$ have periods $q$ and $p - q$.

(ii) Let $u$ and $v$ be the prefix and the suffix of $w$ of length $q$, respectively. Then $uw$ and $wv$ have periods $q$ and $p + q$.

In property (ii) of the previous lemma, the cited source [4] only states and proves the claim for the periods of $uw$, but the case of the periods of $wv$ can be proved similarly.

Lemma 2 (Fine and Wilf [14]). If a word $w \in \mathcal{A}^+$ has periods $p$ and $q$, and
$$p + q - \gcd(p, q) \le |w|,$$
then $w$ has period $\gcd(p, q)$.
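Lemma 2 and its optimality can be sanity-checked by exhaustive search on short words. The sketch below is only an illustration (the helper `fine_wilf_holds` is hypothetical): it tests every word of length at most 10 over $\{a, b\}$, and then exhibits the word $abaaba$ of length $3 + 5 - \gcd(3, 5) - 1 = 6$, which has periods 3 and 5 but not period 1.

```python
from itertools import product
from math import gcd


def has_period(w: str, p: int) -> bool:
    return p >= 1 and all(w[i] == w[i + p] for i in range(len(w) - p))


def fine_wilf_holds(max_len: int, alphabet: str = "ab") -> bool:
    """Exhaustively test Lemma 2 on all words over `alphabet` of length <= max_len."""
    for n in range(1, max_len + 1):
        for letters in product(alphabet, repeat=n):
            w = "".join(letters)
            ps = [p for p in range(1, n + 1) if has_period(w, p)]
            for p in ps:
                for q in ps:
                    d = gcd(p, q)
                    if p + q - d <= n and not has_period(w, d):
                        return False
    return True


# Optimality: z is one letter too short for Lemma 2 to apply.
z = "abaaba"
print(fine_wilf_holds(10))                                   # True
print(has_period(z, 3), has_period(z, 5), has_period(z, 1))  # True True False
```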
It is well known that the above lemma is optimal. That is to say, if neither of $p$ and $q$ equals $\gcd(p, q)$, then there exists a word $z \in \{a, b\}^*$ of length $|z| = p + q - \gcd(p, q) - 1$ with periods $p$ and $q$, but no period $\gcd(p, q)$. Following [ 1 we call a word $z \in \{a, b\}^*$ a central word over $\{a, b\}$ if there exist two coprime integers $p, q$ such that $|z| = p + q - 2$ and both $p$ and $q$ are periods of $z$. Equivalently, $z$ is a central word over $\{a, b\}$ if either $z \in a^* \cup b^*$ or there exist two coprime integers $p, q \ge 2$ such that $|z| = p + q - 2$, both $p$ and $q$ are periods of $z$, but $z$ does not have period $\gcd(p, q) = 1$, that is, both letters $a$ and $b$ occur in $z$. These words are also known as extremal Fine and Wilf words [21]. They are palindromes and unique up to letter-to-letter isomorphism [18, 21]. The latter fact implies that there are exactly two central words over $\{a, b\}$ with given periods $p$ and $q$; if one is $z$, then the other one is $c(z)$, where $c$ is the morphism $a \mapsto b$, $b \mapsto a$.

Recall that the Fibonacci numbers are defined recursively as $F_1 = 1$, $F_2 = 1$, and $F_n = F_{n-1} + F_{n-2}$ for $n \ge 3$, and that every two consecutive Fibonacci numbers are coprime. The two central words over $\{a, b\}$ with periods $F_{n-2}$ and $F_{n-1}$ can be obtained as follows. Let $a, b \in \mathcal{A}$ be distinct letters. Then define $f_1 = b$, $f_2 = a$, and
$$f_n = f_{n-1} f_{n-2} \qquad (n \ge 3).$$

We call the words $f_n$ finite Fibonacci words over $\{a, b\}$. Note that we could also have defined $f_1 = a$ and $f_2 = b$; we will make this distinction when it matters. For $n \ge 3$, let $p_n$ denote the word for which $f_n = p_nxy$ with $xy \in \{ab, ba\}$. Thus
$$f_3 = ab,\quad f_4 = aba,\quad f_5 = abaab,\quad f_6 = abaababa,\quad f_7 = abaababaabaab,$$
$$p_3 = \varepsilon,\quad p_4 = a,\quad p_5 = aba,\quad p_6 = abaaba,\quad p_7 = abaababaaba.$$
Note that $|f_n| = F_n$ and thus $|p_n| = F_n - 2$. It can be shown that if $n \ge 5$, then $p_n$ has periods $F_{n-2}$ and $F_{n-1}$, but it does not have period $\gcd(F_{n-2}, F_{n-1}) = 1$. Thus each $p_n$ is a central word over $\{a, b\}$. The other central word with the same periods is $c(p_n)$, where $c$ is the morphism $a \mapsto b$, $b \mapsto a$.
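The recursion for $f_n$ and the stated properties of $p_n$ are easy to test for moderate $n$. In this sketch (hypothetical helper names; $f_1 = b$ and $f_2 = a$ as above), `p[n]` is obtained from `f[n]` by deleting the last two letters, and `looks_central` checks the length, the two periods, the absence of period 1, and the palindrome property of Lemma 7(1) below.

```python
from math import gcd


def fib(n: int) -> int:
    """Fibonacci numbers with F_1 = F_2 = 1."""
    a, b = 1, 1
    for _ in range(n - 1):
        a, b = b, a + b
    return a


def fibonacci_words(n_max: int, f1: str = "b", f2: str = "a") -> dict[int, str]:
    """f_1 = f1, f_2 = f2, and f_n = f_{n-1} f_{n-2} for n >= 3."""
    f = {1: f1, 2: f2}
    for n in range(3, n_max + 1):
        f[n] = f[n - 1] + f[n - 2]
    return f


def has_period(w: str, p: int) -> bool:
    return p >= 1 and all(w[i] == w[i + p] for i in range(len(w) - p))


def looks_central(p_n: str, n: int) -> bool:
    """For n >= 5: |p_n| = F_{n-2} + F_{n-1} - 2, periods F_{n-2} and F_{n-1},
    no period 1 (both letters occur), and p_n is a palindrome."""
    P, Q = fib(n - 2), fib(n - 1)
    return (gcd(P, Q) == 1 and len(p_n) == P + Q - 2
            and has_period(p_n, P) and has_period(p_n, Q)
            and not has_period(p_n, 1) and p_n == p_n[::-1])


f = fibonacci_words(7)
p = {n: f[n][:-2] for n in range(3, 8)}  # delete the final two letters xy
print([f[n] for n in range(3, 8)])
# ['ab', 'aba', 'abaab', 'abaababa', 'abaababaabaab']
print([p[n] for n in range(3, 8)])
# ['', 'a', 'aba', 'abaaba', 'abaababaaba']
```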

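The order $<$ of $\mathcal{A}$ is extended to $\mathcal{A}^*$ as follows: for $u, v \in \mathcal{A}^*$ we have $u < v$ if
$$\begin{cases} u \text{ is a proper prefix of } v, \text{ or} \\ u = xau' \text{ and } v = xbv' \text{ with } x, u', v' \in \mathcal{A}^*,\ a, b \in \mathcal{A}, \text{ and } a < b. \end{cases}$$
This is called the lexicographic ordering of $\mathcal{A}^*$ with respect to $<$.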

A nonempty, primitive word $w \in \mathcal{A}^+$ is called a Lyndon word if it is the smallest word in its conjugacy class. In particular, letters are Lyndon words, but the empty word is not. For example, the Lyndon words $w \in \{0, 1\}^+$ with $|w| \le 4$ are
$$0,\ 1,\ 01,\ 001,\ 011,\ 0001,\ 0011,\ 0111.$$
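This list can be reproduced by brute force. The sketch below (hypothetical helper names) uses the characterization of Lemma 4(ii) below, that a word is Lyndon if and only if it is strictly smaller than each of its nonempty proper suffixes, and cross-checks it against the definition via conjugates. Python's built-in `str` comparison is lexicographic with a proper prefix counting as smaller, which matches the order defined above.

```python
from itertools import product


def is_lyndon(w: str) -> bool:
    """Lemma 4(ii): nonempty and strictly smaller than every nonempty proper suffix."""
    return len(w) > 0 and all(w < w[i:] for i in range(1, len(w)))


def is_lyndon_by_definition(w: str) -> bool:
    """Primitive and smallest in its conjugacy class: strictly smaller than every
    nontrivial conjugate (a non-primitive word equals one of them)."""
    return len(w) > 0 and all(w < w[i:] + w[:i] for i in range(1, len(w)))


def lyndon_words(alphabet: str, max_len: int) -> list[str]:
    """All Lyndon words over `alphabet` (given in increasing order) of length <= max_len."""
    result = []
    for n in range(1, max_len + 1):
        for letters in product(alphabet, repeat=n):
            w = "".join(letters)
            if is_lyndon(w):
                result.append(w)
    return result


print(lyndon_words("01", 4))
# ['0', '1', '01', '001', '011', '0001', '0011', '0111']
```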

Lemma 3 (Berstel and de Luca [2]). If $z \in \mathcal{A}^*$ is a central word over $\{a, b\}$ and $a < b$, then $azb$ is a Lyndon word.

According to Lemma 3, the words $ap_nb$ and $ac(p_n)b$ are Lyndon words; we call them Fibonacci Lyndon words of length $F_n$ over $\{a, b\}$. The first few ones are
$$ab,\quad aab,\quad aabab,\quad aabaabab,\quad aabaababaabab,$$
$$ab,\quad abb,\quad ababb,\quad ababbabb,\quad ababbababbabb.$$
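Both rows can be generated directly from the finite Fibonacci words. In this sketch (hypothetical names), `fibonacci_lyndon_words(n)` returns the pair $(ap_nb, ac(p_n)b)$, with $p_n$ taken from $f_1 = b$, $f_2 = a$ and $c$ implemented as a letter swap; the Lyndon property asserted by Lemma 3 is then checked for larger $n$.

```python
def is_lyndon(w: str) -> bool:
    return len(w) > 0 and all(w < w[i:] for i in range(1, len(w)))


def fibonacci_words(n_max: int, f1: str = "b", f2: str = "a") -> dict[int, str]:
    f = {1: f1, 2: f2}
    for n in range(3, n_max + 1):
        f[n] = f[n - 1] + f[n - 2]
    return f


SWAP = str.maketrans("ab", "ba")  # the morphism c: a -> b, b -> a


def fibonacci_lyndon_words(n: int) -> tuple[str, str]:
    """(a p_n b, a c(p_n) b) for n >= 3, with p_n taken from f_1 = b, f_2 = a."""
    p_n = fibonacci_words(n)[n][:-2]
    return "a" + p_n + "b", "a" + p_n.translate(SWAP) + "b"


first = [fibonacci_lyndon_words(n)[0] for n in range(3, 8)]
second = [fibonacci_lyndon_words(n)[1] for n in range(3, 8)]
print(first)   # ['ab', 'aab', 'aabab', 'aabaabab', 'aabaababaabab']
print(second)  # ['ab', 'abb', 'ababb', 'ababbabb', 'ababbababbabb']
```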

The properties of Lyndon words given in the next lemma are well known [17, 11].

Lemma 4. Lyndon words have the following properties.

(i) Lyndon words are unbordered.

(ii) A word $w$ is Lyndon if and only if $w < y$ for all nonempty proper suffixes $y$ of $w$.

(iii) If $u$ and $v$ are Lyndon words and $u < v$, then $uv$ is a Lyndon word.

Lemma 5. Let $w \in \mathcal{A}^+$ be a Lyndon word. Suppose that $za$ is a periodic extension of $w$ for some $z \in \mathcal{A}^*$ and $a, b \in \mathcal{A}$ with $a < b$. Then $zb$ is a Lyndon word.

Proof. There exist an integer $n \ge 0$ and words $x, y \in \mathcal{A}^*$ such that $w = xay$ and $z = w^nx$. We show that $xb$ is a Lyndon word; this suffices because then repeated application of Lemma 4(iii) implies that $zb = w^nxb$ is a Lyndon word, since $w < xb$.

Contrary to what we want to show, suppose that $xb$ is not a Lyndon word. Then Lemma 4 implies that $xb$ has a nonempty proper suffix $v$ with $v < xb$. Write $v = v'b$. Then $v'ay$ is a suffix of $w$, so that $v'ay > w$ because $w$ is a Lyndon word. Therefore $b > a$ implies that $v = v'b$ is not a prefix of $w$, and thus not a prefix of $x$. Consequently, $v < xb$ implies that we can write $v = tct'$ and $xb = tdt''$ for some words $t, t', t'' \in \mathcal{A}^*$ and letters $c, d \in \mathcal{A}$ with $c < d$. Since $|td| = |tc| \le |v| < |xb|$, the word $td$ is a prefix of $x$ and thus a prefix of $w$. But then
$$v'ay < v'by = vy = tct'y < td \le w,$$
a contradiction. Thus $xb$ is a Lyndon word, and the proof is complete. □

Lemma 6. Let $w \in \mathcal{A}^+$ be a Lyndon word with $|w| \ge 2$. Let $\lambda_w$ be the longest proper prefix of $w$ that is also a Lyndon word. Then $w^-$ is a periodic extension of $\lambda_w$. Furthermore, the word $\mu_w = \lambda_w^{-1}w$ is a Lyndon word.

Proof. Suppose that $w^-$ is not a periodic extension of $\lambda_w$. Then there exist a word $u$ and different letters $a, b$ such that $ua$ is a prefix of $\lambda_w$ and $\lambda_w^kub$ is a prefix of $w^-$ for some integer $k \ge 1$. If $a < b$, then $\lambda_w^kub$ is a Lyndon word by Lemma 5, contradicting the maximality of $|\lambda_w|$. If $a > b$, then $(\lambda_w^k)^{-1}w < w$ because $(\lambda_w^k)^{-1}w$ begins with $ub$ while $w$ begins with $ua$. Thus $w$ has a nonempty proper suffix that is smaller than $w$, contradicting Lemma 4 because $w$ is Lyndon. Therefore $w^-$ is a periodic extension of $\lambda_w$, and consequently there exist $u \in \mathcal{A}^*$ and different letters $a, b \in \mathcal{A}$ such that $ua$ is a prefix of $\lambda_w$ and $w = \lambda_w^kub$ for some integer $k \ge 1$. Furthermore, $a < b$ because otherwise $w$ would not be a Lyndon word. Now $\mu_w = \lambda_w^{-1}w = \lambda_w^{k-1}ub$ is a Lyndon word by Lemma 5. □

Due to its importance in upcoming considerations, let us restate Lemma 6: every non-letter Lyndon word $w \in \mathcal{A}^+$ can be written as $w = \lambda_w\mu_w$, where $\lambda_w$ and $\mu_w$ are Lyndon words and $w^-$ is a periodic extension of $\lambda_w$.
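The factorization $w = \lambda_w\mu_w$ can be computed straight from its definition. The sketch below (hypothetical names) finds $\lambda_w$ as the longest proper Lyndon prefix and verifies both claims of Lemma 6 exhaustively on short words; this is a finite check, not a proof.

```python
from itertools import product


def is_lyndon(w: str) -> bool:
    return len(w) > 0 and all(w < w[i:] for i in range(1, len(w)))


def is_periodic_extension(z: str, w: str) -> bool:
    """z is a prefix of some word in w^+."""
    return bool(z) and bool(w) and (w * (len(z) // len(w) + 1))[: len(z)] == z


def split_lambda_mu(w: str) -> tuple[str, str]:
    """(lambda_w, mu_w): lambda_w is the longest proper prefix of w that is a
    Lyndon word and mu_w = lambda_w^{-1} w. Requires len(w) >= 2; the first
    letter is always a Lyndon prefix, so the maximum below exists."""
    lam = max((w[:i] for i in range(1, len(w)) if is_lyndon(w[:i])), key=len)
    return lam, w[len(lam):]


def lemma_6_holds(alphabet: str, max_len: int) -> bool:
    """Check both claims of Lemma 6 for all Lyndon words of length 2..max_len."""
    for n in range(2, max_len + 1):
        for letters in product(alphabet, repeat=n):
            w = "".join(letters)
            if is_lyndon(w):
                lam, mu = split_lambda_mu(w)
                # w[:-1] is the word w^- (w with its last letter deleted)
                if not (is_lyndon(mu) and is_periodic_extension(w[:-1], lam)):
                    return False
    return True


print(split_lambda_mu("aabab"))     # ('aab', 'ab')
print(split_lambda_mu("aabaabab"))  # ('aab', 'aabab')
```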

An infinite word is a sequence $\mathbf{x} = a_1a_2a_3\cdots a_n\cdots$, where $a_n \in \mathcal{A}$. The set of infinite words over $\mathcal{A}$ is denoted by $\mathcal{A}^{\mathbb{N}}$. A tail of the infinite word $\mathbf{x}$ is another infinite word $\mathbf{y} \in \mathcal{A}^{\mathbb{N}}$ such that $\mathbf{x} = x\mathbf{y}$ for some $x \in \mathcal{A}^*$. An infinite word $\mathbf{x}$ is purely periodic if $\mathbf{x} = uuu\cdots u\cdots$ for some finite word $u \in \mathcal{A}^+$; we also denote this by $\mathbf{x} = u^\omega$. The word $\mathbf{x}$ is ultimately periodic if it has a purely periodic tail. Finally, $\mathbf{x}$ is aperiodic if it is not ultimately periodic. A factor of $\mathbf{x}$ is a finite word that occurs somewhere in $\mathbf{x}$. The set of all factors of $\mathbf{x}$ is denoted by $F(\mathbf{x})$. The word $\mathbf{x}$ is called recurrent if each of its factors occurs at least twice (and thus infinitely many times) in $\mathbf{x}$. If an infinite word is ultimately periodic and recurrent, then it is actually purely periodic. The shift orbit closure of $\mathbf{x}$ is the set of all infinite words $\mathbf{y} \in \mathcal{A}^{\mathbb{N}}$ such that $F(\mathbf{y}) \subseteq F(\mathbf{x})$. We denote the set of Lyndon factors of $\mathbf{x}$ by $L(\mathbf{x})$. (Do not confuse this with the function $\mathcal{L}_{\mathbf{x}}$ defined in Section 4.)

Let $f_n$ be the finite Fibonacci words over $\{a, b\}$ defined above with $f_1 = b$ and $f_2 = a$. If $n \ge 2$, then $f_n$ is a prefix of $f_{n+1}$. Thus there exists a unique infinite word $\mathbf{f}$ such that $f_n$ is a prefix of $\mathbf{f}$ for every $n \ge 2$. The word $\mathbf{f}$ is called a Fibonacci infinite word over $\{a, b\}$. Note that there is another Fibonacci infinite word over $\{a, b\}$, which results from defining $f_1 = a$ and $f_2 = b$. When we want to stress the definition of $f_1$ and $f_2$ when constructing $\mathbf{f}$, we denote $\mathbf{f} = \lim_{n\to\infty} f_n$. The infinite Fibonacci words are an archetype of Sturmian words [18]. In particular, they are recurrent.

Lemma 7. The Fibonacci word $\mathbf{f} = \lim_{n\to\infty} f_n$ has the following properties.

(1) The words $p_n$ are palindromes [Q].

(2) The Lyndon factors of $\mathbf{f}$ are precisely the Lyndon conjugates of the words $f_n$ [8, Lemma 7].

(3) If $w$ is a conjugate of $f_n$, then the reversal $\tilde{w}$ is a conjugate of $f_n$ [22].

Lemma 8. Let $\mathbf{f} = \lim_{n\to\infty} f_n$ be a Fibonacci word for which $\{f_1, f_2\} = \{a, b\}$. Write $f_n = p_nxy$ with $xy \in \{ab, ba\}$, and suppose that $a < b$. Then the Lyndon conjugate of $f_n$ is the word $ap_nb$ for all $n \ge 3$. Furthermore, every Lyndon factor of $\mathbf{f}$ that is shorter than $ap_nb$ is either a prefix or a suffix of $ap_nb$.

Proof. First off, the word $ap_nb$ is a Lyndon word by Lemma 3. Thus the first claim is proved by showing that $ap_nb$ is a conjugate of $f_n$. This is clear if $f_n = p_nba$, so assume that $f_n = p_nab$ instead. Then $bp_na$ is a conjugate of $f_n$. Since the reversal of a conjugate of $f_n$ is a conjugate of $f_n$ and since $p_n$ is a palindrome by Lemma 7, it follows that $a\tilde{p}_nb = ap_nb$ is a conjugate of $f_n$.

Next we show that if $k < n$, then the Lyndon conjugate of $f_k$ is a prefix or a suffix of $ap_nb$; by Lemma 7(2), this proves the second claim. Since $\{f_1, f_2\} = \{a, b\}$, the claim is plainly true for $k = 1, 2$. Also, a quick case analysis shows that the Lyndon conjugate of $f_3$, which is $ab$, is either a prefix or a suffix of $ap_nb$. Thus we may suppose that $k \ge 4$. Then $k < n$ implies that $f_k$ is a prefix of $p_n$. Furthermore, since $p_n$ is a palindrome, the reversal $\tilde{f}_k$ is a suffix of $p_n$. Therefore, if $f_k = p_kba$, then $ap_kb$ is a prefix of $ap_nb$; if $f_k = p_kab$, then $ap_kb$ is a suffix of $ap_nb$.

□

3. Lyndon factors of Lyndon words 

Let $w \in \mathcal{A}^+$ be a Lyndon word. We denote the number of distinct Lyndon factors of $w$ by $\mathcal{L}(w)$. A trivial but useful observation is that if $|w| \ge 2$, then
$$\mathcal{L}(w) \ge \mathcal{L}(\lambda_w) + 1 \quad\text{and}\quad \mathcal{L}(w) \ge \mathcal{L}(\mu_w) + 1,$$
where $\lambda_w$ and $\mu_w$ are the Lyndon words provided by Lemma 6. If $|w| \ge 2$, let $p_w$ denote the word such that $w = ap_wb$ for some letters $a, b \in \mathcal{A}$.

Lemma 9. If $w \in \mathcal{A}^+$ is a Fibonacci Lyndon word of length $F_n$ with $n \ge 3$, then $\mathcal{L}(w) = n$.

Proof. This is an immediate corollary to Lemma 8: the word $w$ is the Lyndon conjugate of $f_n$ (for some choice of $f_1, f_2 \in \mathcal{A}$). Each of the Lyndon conjugates of $f_k$ with $1 \le k < n$ is either a prefix or a suffix of $w$, and by Lemma 7(2) these, together with $w$ itself, are the only Lyndon factors of $w$. These $n$ words are distinct, so

$\mathcal{L}(w) = n$. □
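Lemmas 8 and 9 can be confirmed numerically for small $n$. The sketch below (hypothetical helper names) computes the Lyndon conjugate of $f_n$ as its smallest rotation, compares it with $ap_nb$, and counts its distinct Lyndon factors; each tuple in `results` is $(n, F_n, \mathcal{L}(w))$.

```python
def is_lyndon(w: str) -> bool:
    return len(w) > 0 and all(w < w[i:] for i in range(1, len(w)))


def lyndon_factors(w: str) -> set[str]:
    """All distinct factors of w that are Lyndon words."""
    return {w[i:j] for i in range(len(w)) for j in range(i + 1, len(w) + 1)
            if is_lyndon(w[i:j])}


def lyndon_conjugate(u: str) -> str:
    """Smallest conjugate of u (the Lyndon conjugate when u is primitive)."""
    return min(u[i:] + u[:i] for i in range(len(u)))


def fibonacci_words(n_max: int, f1: str = "b", f2: str = "a") -> dict[int, str]:
    f = {1: f1, 2: f2}
    for n in range(3, n_max + 1):
        f[n] = f[n - 1] + f[n - 2]
    return f


f = fibonacci_words(11)
results = []
for n in range(3, 12):
    w = lyndon_conjugate(f[n])  # by Lemma 8 this is a p_n b
    results.append((n, len(w), len(lyndon_factors(w))))
print(results[:4])  # [(3, 2, 3), (4, 3, 4), (5, 5, 5), (6, 8, 6)]
```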

Lemma 10. Let $w \in \mathcal{A}^+$ be a Lyndon word with $|w| = F_n$ for $n \ge 3$, and let $a, b \in \mathcal{A}$ be the letters such that $w = ap_wb$. Then $w$ is a Fibonacci Lyndon word over $\{a, b\}$ if and only if $ap_w$ has period $F_{n-1}$ and $p_wb$ has period $F_{n-2}$, or vice versa.

Proof. Suppose that $w$ is a Fibonacci Lyndon word over $\{a, b\}$. The claim is readily verified for $n \le 4$, so assume that $n \ge 5$. Since $p_w$ has periods $F_{n-2}$ and $F_{n-1}$ but does not have period $\gcd(F_{n-2}, F_{n-1}) = 1$, it follows that the word $ap_w$ has period either $F_{n-2}$ or $F_{n-1}$. Indeed, if $ap_w$ did not have either period, then $bp_w$ would have both periods, contradicting Lemma 2 because $|p_w| = F_n - 2$, so that $|bp_w| = F_{n-2} + F_{n-1} - 1$. An analogous argument shows that $p_wb$ must have period either $F_{n-2}$ or $F_{n-1}$. Finally, $ap_w$ and $p_wb$ cannot have the same period $F_{n-2}$ or $F_{n-1}$, because otherwise $w$ would have the same period, which it does not since it is unbordered by Lemma 4.

Conversely, suppose that $ap_w$ has period $F_{n-1}$ and $p_wb$ has period $F_{n-2}$, or vice versa. Again, if $n \le 4$, the claim is trivial, so assume $n \ge 5$. Then, since $F_{n-1} < |ap_w| = |p_wb|$, both $a$ and $b$ occur in $p_w$, and thus $\gcd(F_{n-1}, F_{n-2}) = 1$ is not a period of $p_w$. Since $p_w$ also satisfies $|p_w| = F_n - 2$ and has periods $F_{n-2}$ and $F_{n-1}$, we have shown that $p_w$ is a central word, and thus $w$ is a Fibonacci Lyndon word over $\{a, b\}$. □
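For small $n$, the characterization in Lemma 10 can be compared against the explicit Fibonacci Lyndon words on every binary Lyndon word of length $F_n$. The sketch below is an illustration with hypothetical names; since binary Lyndon words of length at least 2 begin with $a$ and end with $b$, `w[:-1]` plays the role of $ap_w$ and `w[1:]` the role of $p_wb$.

```python
from itertools import product


def fib(n: int) -> int:
    a, b = 1, 1
    for _ in range(n - 1):
        a, b = b, a + b
    return a


def has_period(w: str, p: int) -> bool:
    return p >= 1 and all(w[i] == w[i + p] for i in range(len(w) - p))


def is_lyndon(w: str) -> bool:
    return len(w) > 0 and all(w < w[i:] for i in range(1, len(w)))


def fibonacci_lyndon(n: int) -> set[str]:
    """The Fibonacci Lyndon words a p_n b and a c(p_n) b over {a, b}."""
    f = {1: "b", 2: "a"}
    for k in range(3, n + 1):
        f[k] = f[k - 1] + f[k - 2]
    p = f[n][:-2]
    return {"a" + p + "b", "a" + p.translate(str.maketrans("ab", "ba")) + "b"}


def period_condition(w: str, n: int) -> bool:
    """a p_w has period F_{n-1} and p_w b has period F_{n-2}, or vice versa."""
    ap, pb = w[:-1], w[1:]
    P, Q = fib(n - 1), fib(n - 2)
    return ((has_period(ap, P) and has_period(pb, Q))
            or (has_period(ap, Q) and has_period(pb, P)))


def lemma_10_holds(n: int) -> bool:
    """Compare both sides of Lemma 10 on every binary Lyndon word of length F_n."""
    targets = fibonacci_lyndon(n)
    for letters in product("ab", repeat=fib(n)):
        w = "".join(letters)
        if is_lyndon(w) and (w in targets) != period_condition(w, n):
            return False
    return True


print([lemma_10_holds(n) for n in range(3, 8)])  # [True, True, True, True, True]
```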

Lemma 11. Let $w \in \mathcal{A}^+$ be a Lyndon word with $|w| \ge F_n$ for some $n \ge 3$. Then $\mathcal{L}(w) \ge n$. Furthermore, the identity $\mathcal{L}(w) = n$ implies that $w$ is a Fibonacci Lyndon word of length $F_n$.

Proof. We proceed by induction on $n$. Using the fact that letters are Lyndon words and that $w$ is a product of two shorter Lyndon words, the reader readily verifies the claim for $n \le 4$. Hence we assume that $n \ge 5$. Since $w$ is a Lyndon word, we have $w = \lambda_w\mu_w = ap_wb$ for some letters $a, b \in \mathcal{A}$ with $a < b$. We split the proof into several cases depending on the length of $\lambda_w$.

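Case (i). Suppose that $|\lambda_w| > F_{n-1}$. Since $n - 1 \ge 4$, we may apply the induction assumption to $\lambda_w$; because $|\lambda_w| \ne F_{n-1}$, equality is excluded, so $\mathcal{L}(\lambda_w) \ge n$ and we obtain
$$\mathcal{L}(w) \ge \mathcal{L}(\lambda_w) + 1 \ge n + 1 > n.$$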

Case (ii). Suppose that $|\lambda_w| = F_{n-1}$. Then $|\mu_w| \ge F_{n-2}$, and we have three subcases.

Case (ii-a). If $\mu_w$ is not a factor of $\lambda_w$, then the induction assumption implies
$$\mathcal{L}(w) \ge \mathcal{L}(\lambda_w) + \#\{w, \mu_w\} \ge (n - 1) + 2 > n.$$

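Case (ii-b). Suppose that $\mu_w$ is a factor but not a suffix of $\lambda_w$. Then, denoting the longest proper Lyndon prefix of $\lambda_w$ by $\lambda_{\lambda_w}$, we have $|\lambda_{\lambda_w}| \ge |\mu_w|$ because $\lambda_w$ is a periodic extension of $\lambda_{\lambda_w}$ by Lemma 6 and $\mu_w$ is unbordered by Lemma 4. Therefore $|\lambda_{\lambda_w}| \ge F_{n-2}$. Since $n - 2 \ge 3$, we may apply the induction assumption to $\lambda_{\lambda_w}$, obtaining
$$\mathcal{L}(w) \ge \mathcal{L}(\lambda_w) + 1 \ge \mathcal{L}(\lambda_{\lambda_w}) + 2 \ge (n - 2) + 2 = n, \tag{1}$$
where, by the induction assumption, the last inequality is an equality only if $|\mu_w| = |\lambda_{\lambda_w}| = F_{n-2}$. This would imply that $\lambda_{\lambda_w}$ and $\mu_w$ are conjugates (a factor of a periodic extension of $\lambda_{\lambda_w}$ of length $|\lambda_{\lambda_w}|$ is a conjugate of $\lambda_{\lambda_w}$), which would further imply that $\lambda_{\lambda_w} = \mu_w$ because both are Lyndon words. But then $\mu_w$ is both a prefix and a suffix of $w$, contradicting the fact that $w$ is unbordered. Hence the third inequality in (1) is strict.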

Case (ii-c). Suppose that $\mu_w$ is a suffix of $\lambda_w$. Then it is actually a suffix of $p_{\lambda_w}b$, because $\mu_w$ cannot equal $\lambda_w$. Since $|\lambda_w| = F_{n-1}$, the induction assumption gives
$$\mathcal{L}(w) \ge \mathcal{L}(\lambda_w) + 1 \ge (n - 1) + 1 = n.$$
If $\mathcal{L}(w) > n$, we are done, so assume $\mathcal{L}(w) = n$. Then $\mathcal{L}(\lambda_w) = n - 1$. We will show that $ap_w$ has period $F_{n-1}$ and $p_wb$ has period $F_{n-2}$, which means that $w$ is a Fibonacci Lyndon word by Lemma 10.

First, the word $ap_w$ has period $|\lambda_w| = F_{n-1}$ because $ap_w = w^-$ is a periodic extension of $\lambda_w$ by Lemma 6. Second, since $|\lambda_w| = F_{n-1}$ and $\mathcal{L}(\lambda_w) = n - 1$, the induction assumption implies that $\lambda_w$ is a Fibonacci Lyndon word, and therefore Lemma 10 implies that $p_{\lambda_w}b$ has period either $F_{n-3}$ or $F_{n-2}$. The period cannot be $F_{n-3}$, however, because the unbordered word $\mu_w$ is a suffix of $p_{\lambda_w}b$ and $|\mu_w| \ge F_{n-2} > F_{n-3}$. Thus $p_{\lambda_w}b$ has period $F_{n-2}$, and furthermore $|\mu_w| = F_{n-2}$, since an unbordered factor of a word with period $F_{n-2}$ has length at most $F_{n-2}$. Now the facts that $\mu_w$ is a suffix of $p_{\lambda_w}b$ with $|\mu_w| = F_{n-2}$ and that $p_{\lambda_w}b$ has period $F_{n-2}$ imply that $p_wb = p_{\lambda_w}b\mu_w$ has period $F_{n-2}$ by Lemma 1.
Case (iii). Suppose that $|w|/2 < |\lambda_w| < F_{n-1}$. Then $|\mu_w| = |w| - |\lambda_w|$ gives $F_{n-2} < |\mu_w| < |\lambda_w|$, so $\lambda_w$ is not a factor of $\mu_w$. Furthermore, since $n - 2 \ge 3$, the induction assumption gives
$$\mathcal{L}(w) \ge \mathcal{L}(\mu_w) + \#\{w, \lambda_w\} > (n - 2) + 2 = n,$$
where the strict inequality holds because $|\mu_w| \ne F_{n-2}$.

Case (iv). Suppose that $|\lambda_w| = |w|/2$. Then $|\mu_w| = |\lambda_w|$, and thus $\mu_w$ is not a factor of $\lambda_w$, because otherwise $\mu_w = \lambda_w$ and $w$ would be bordered. Noting that $|\lambda_w| = |w|/2 > F_{n-2}$ because $n \ge 4$, we therefore have by induction
$$\mathcal{L}(w) \ge \mathcal{L}(\lambda_w) + \#\{w, \mu_w\} > (n - 2) + 2 = n.$$

Case (v). Suppose that $F_{n-2} < |\lambda_w| < |w|/2$. Then $|\lambda_w| < |\mu_w|$, and so $\mu_w$ is not a factor of $\lambda_w$. Since we also have $n - 2 \ge 3$, the induction assumption gives
$$\mathcal{L}(w) \ge \mathcal{L}(\lambda_w) + \#\{w, \mu_w\} > (n - 2) + 2 = n.$$

Case (vi). Suppose that $|\lambda_w| = F_{n-2}$. Then $|\mu_w| \ge F_{n-1}$, so the induction assumption gives
$$\mathcal{L}(w) \ge \mathcal{L}(\mu_w) + 1 \ge (n - 1) + 1 = n. \tag{2}$$
If $\mathcal{L}(w) > n$, we are done, so assume $\mathcal{L}(w) = n$. Our goal is to show that $ap_w$ has period $F_{n-2}$ and $p_wb$ has period $F_{n-1}$, which means that $w$ is a Fibonacci Lyndon word by Lemma 10. The first objective is easy because $w^- = ap_w$ is a periodic extension of $\lambda_w$ by Lemma 6, and thus $ap_w$ has period $|\lambda_w| = F_{n-2}$.

For the second objective, we take care of a special case first. If $n = 5$, then $|\lambda_w| = 2$, so that $\lambda_w = ab$. Then $w = ababb$ because $w^-$ is a periodic extension of $\lambda_w$, and consequently $p_wb = babb$ has period $F_4 = 3$, as claimed. We may thus assume that $n \ge 6$.

Now, note that since $\mathcal{L}(w) = n$ in Eq. (2), we have $\mathcal{L}(\mu_w) = n - 1$. Since also $|\mu_w| \ge F_{n-1}$, the induction assumption says that actually $|\mu_w| = F_{n-1}$ and that $\mu_w$ is a Fibonacci Lyndon word. Thus Lemma 10 implies that one of $ap_{\mu_w}$ and $p_{\mu_w}b$ has period $F_{n-3}$ and the other one period $F_{n-2}$. Since $w^- = \lambda_wap_{\mu_w}$ is a periodic extension of $\lambda_w$ and $|\lambda_w| < |ap_{\mu_w}|$, we see that $\lambda_w$ is a prefix of $ap_{\mu_w}$. Therefore $ap_{\mu_w}$ cannot have period $F_{n-3}$, because $F_{n-3} < |\lambda_w|$ and $\lambda_w$ is unbordered. Hence $ap_{\mu_w}$ has period $F_{n-2}$ and $p_{\mu_w}b$ has period $F_{n-3}$. Since $n \ge 6$, we have $|p_{\lambda_w}ba| = F_{n-2} \le F_{n-1} - 2 = |p_{\mu_w}|$, and consequently, since $p_w = p_{\lambda_w}bap_{\mu_w}$ and $p_w$ has period $F_{n-2}$, it follows that $p_{\lambda_w}ba$ is a prefix of $p_{\mu_w}$. Since $p_{\mu_w}$ has periods $F_{n-3}$ and $F_{n-2}$, Lemma 1 implies that $p_w$ has periods $F_{n-2}$ and $F_{n-2} + F_{n-3} = F_{n-1}$. Now, as we have reasoned before, $p_wb$ must have period either $F_{n-2}$ or $F_{n-1}$, for otherwise $p_wa$ would have both periods, contradicting Lemma 2. Since $ap_w$ has period $F_{n-2}$ and $w$ is unbordered, we conclude that $p_wb$ must have period $F_{n-1}$.

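Case (vii). Suppose that $|\lambda_w| < F_{n-2}$. Then $|\mu_w| > F_{n-1}$, so that the induction assumption implies
$$\mathcal{L}(w) \ge \mathcal{L}(\mu_w) + 1 > (n - 1) + 1 = n,$$
where the strict inequality holds because $|\mu_w| \ne F_{n-1}$.

□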

Theorem 1. Let $w \in \mathcal{A}^+$ be a Lyndon word with $|w| \ge F_n$ for some $n \ge 3$. Then $\mathcal{L}(w) \ge n$, with equality if and only if $w$ is a Fibonacci Lyndon word of length $F_n$.

Proof. The claim is obtained by combining Lemmas 9 and 11. □
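Theorem 1 can be tested by brute force on small alphabets. For each Lyndon word $w$ with $|w| \ge 2$, the sketch below takes the largest $n \ge 3$ with $F_n \le |w|$ (smaller values of $n$ then follow trivially) and checks that $\mathcal{L}(w) \ge n$, with equality exactly for the Fibonacci Lyndon words, after an order-preserving renaming of the two letters to $a < b$. All names are hypothetical, and such an exhaustive check covers only finitely many words.

```python
from itertools import product


def fib(n: int) -> int:
    a, b = 1, 1
    for _ in range(n - 1):
        a, b = b, a + b
    return a


def is_lyndon(w: str) -> bool:
    return len(w) > 0 and all(w < w[i:] for i in range(1, len(w)))


def lyndon_factors(w: str) -> set[str]:
    return {w[i:j] for i in range(len(w)) for j in range(i + 1, len(w) + 1)
            if is_lyndon(w[i:j])}


def largest_index(length: int) -> int:
    """Largest n >= 3 with F_n <= length (requires length >= 2)."""
    n = 3
    while fib(n + 1) <= length:
        n += 1
    return n


def is_fibonacci_lyndon(w: str) -> bool:
    """Rename the two letters of w to a < b (order-preserving) and compare with
    a p_n b and a c(p_n) b, where F_n = len(w)."""
    letters = sorted(set(w))
    if len(letters) != 2:
        return False
    n = largest_index(len(w))
    if fib(n) != len(w):
        return False
    v = w.translate(str.maketrans(letters[0] + letters[1], "ab"))
    f = {1: "b", 2: "a"}
    for k in range(3, n + 1):
        f[k] = f[k - 1] + f[k - 2]
    p = f[n][:-2]
    return v in ("a" + p + "b", "a" + p.translate(str.maketrans("ab", "ba")) + "b")


def theorem_1_holds(alphabet: str, max_len: int) -> bool:
    """For every Lyndon word w with 2 <= |w| <= max_len and n = largest_index(|w|):
    L(w) >= n, with equality iff w is a Fibonacci Lyndon word of length F_n."""
    for length in range(2, max_len + 1):
        n = largest_index(length)
        for letters in product(alphabet, repeat=length):
            w = "".join(letters)
            if not is_lyndon(w):
                continue
            count = len(lyndon_factors(w))
            if count < n or (count == n) != is_fibonacci_lyndon(w):
                return False
    return True


print(theorem_1_holds("ab", 13), theorem_1_holds("abc", 8))  # True True
```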

Corollary 1. If $w \in \mathcal{A}^+$ is a Lyndon word, then $\mathcal{L}(w) > \log_\phi|w|$, where $\phi = (1 + \sqrt{5})/2$ is the golden ratio.

Proof. The claim is trivial if $|w| = 1$, so suppose $|w| \ge 2$, and let $n \ge 3$ be the unique integer for which $F_n \le |w| < F_{n+1}$. It is well known [16] that $F_{n+1} \le \phi^n$, and therefore $\log_\phi|w| < n$. On the other hand, Theorem 1 says that $\mathcal{L}(w) \ge n$. The claim follows from the last two inequalities. □
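The two ingredients of this proof, $F_{n+1} \le \phi^n$ and the resulting inequality $\log_\phi F_n < n$, are easy to check numerically, and doing so also shows how far the naive bound $\log_2|w| + 1$ lags behind. This is a small floating-point sketch with hypothetical function names; the printed rows are $n$, $F_n$, the naive bound, and the golden-ratio bound, to be compared with the exact value $n$ of Lemma 9.

```python
import math

PHI = (1 + math.sqrt(5)) / 2  # the golden ratio


def fib(n: int) -> int:
    a, b = 1, 1
    for _ in range(n - 1):
        a, b = b, a + b
    return a


def naive_bound(length: int) -> float:
    """log_2 |w| + 1, from splitting Lyndon words into two shorter ones."""
    return math.log2(length) + 1


def golden_bound(length: int) -> float:
    """log_phi |w|, the lower bound of Corollary 1."""
    return math.log(length) / math.log(PHI)


# Fibonacci Lyndon words of length F_n have exactly n Lyndon factors (Lemma 9).
for n in range(3, 11):
    F = fib(n)
    print(n, F, round(naive_bound(F), 2), round(golden_bound(F), 2))
```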

Remark 1. A noteworthy feature of Theorem 1 is that the optimal words, the Fibonacci 
Lyndon words, are made of just two different letters. A priori it may seem "obvious" that 
this should always be the case for a Lyndon word of a given length having the smallest 
possible number of Lyndon factors, but this, in fact, is not always true. For example, each 
Lyndon word of length 6 has at least 7 Lyndon factors. In this case the optimal words are, 
up to renaming the letters, 

000001 000101 001101 010111 010102 010202 021022 011111, 

three of which are made of three different letters. However, see Conjecture 1. 
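The length-6 example can be explored by exhaustive search. The sketch below (hypothetical names) restricts attention to the alphabet $\{0, 1, 2\}$, computes the minimum of $\mathcal{L}(w)$ over Lyndon words of length 6, and collects the minimizers after an order-preserving renaming of their letters to $0, 1, 2, \ldots$. It checks that the binary minimizers are exactly the five binary words listed above and that the three ternary words listed are among the minimizers; it says nothing about larger alphabets.

```python
from itertools import product


def is_lyndon(w: str) -> bool:
    return len(w) > 0 and all(w < w[i:] for i in range(1, len(w)))


def lyndon_factors(w: str) -> set[str]:
    return {w[i:j] for i in range(len(w)) for j in range(i + 1, len(w) + 1)
            if is_lyndon(w[i:j])}


def canonical(w: str) -> str:
    """Order-preserving renaming of the letters of w to 0, 1, 2, ..."""
    letters = sorted(set(w))
    return "".join(str(letters.index(c)) for c in w)


def optimal_lyndon_words(length: int, alphabet: str = "012") -> tuple[int, set[str]]:
    """Minimum of L(w) over Lyndon words w of the given length over `alphabet`,
    and the minimizers up to order-preserving renaming of letters."""
    best, words = None, set()
    for letters in product(alphabet, repeat=length):
        w = "".join(letters)
        if not is_lyndon(w):
            continue
        count = len(lyndon_factors(w))
        if best is None or count < best:
            best, words = count, set()
        if count == best:
            words.add(canonical(w))
    return best, words


best, words = optimal_lyndon_words(6)
print(best)  # 7
```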

Conjecture 1. For $n \ge 1$, denote
$$l(n) = \min\{\mathcal{L}(w) \mid w \text{ is a Lyndon word with } |w| = n\}.$$
If $w$ is a Lyndon word with $|w| \ne 6$ and $\mathcal{L}(w) = l(|w|)$, then $w$ is a Sturmian Lyndon word, i.e., we have $w \in \{a, b\}^+$, $w = ap_wb$, and $p_w$ is a central word. Furthermore, we have
$$l(n) = \min\Big\{\sum_{i=0}^{k} a_i \;\Big|\; \text{there exists an integer } q \text{ such that } n/q = [a_0; a_1, \ldots, a_k]\Big\}.$$
The expression $[a_0; a_1, \ldots, a_k]$ above denotes the continued fraction expansion of $n/q$.

4. Lyndon factors of recurrent words 

Our main goal in this section is to prove Theorem 4 stating that the number of Lyndon 
factors in recurrent infinite words is governed by the Fibonacci word. We begin with results 
that are interesting in their own right. 

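Lemma 12 (Siromoney, Mathew, Dare, and Subramanian P"]). Each infinite word $\mathbf{x} \in \mathcal{A}^{\mathbb{N}}$ admits a unique factorization of the form either
$$\mathbf{x} = \prod_{i \ge 1} w_i \quad\text{or}\quad \mathbf{x} = w_1w_2\cdots w_n\mathbf{x}',$$
where each $w_i \in \mathcal{A}^+$ is a finite Lyndon word with $w_i \ge w_{i+1}$ and $\mathbf{x}' \in \mathcal{A}^{\mathbb{N}}$ begins with arbitrarily long Lyndon words.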

Lemma 13. Let $\mathbf{x} \in \mathcal{A}^{\mathbb{N}}$ be an infinite word. Then at least one of the following holds:

(i) $\mathbf{x}$ is ultimately periodic;

(ii) $\mathbf{x}$ has a tail that begins with arbitrarily long Lyndon words;

(iii) $\mathbf{x} = \prod_{i \ge 1} w_i$, where each $w_i \in \mathcal{A}^+$ is a Lyndon word and, for every $k \ge 1$, there exists an index $i_k$ such that $|w_i| > k$ for all $i > i_k$.

Proof. Let us suppose that $\mathbf{x}$ is aperiodic and that it does not have a tail beginning with arbitrarily long Lyndon words; we will show that then $\mathbf{x}$ satisfies property (iii). Let $k \ge 1$. Lemma 12 says that $\mathbf{x}$ admits a factorization $\mathbf{x} = \prod_{i \ge 1} w_i$ in which the $w_i \in \mathcal{A}^+$ are Lyndon words and $w_i \ge w_{i+1}$. Since $\mathbf{x}$ is aperiodic, the sequence of words $w_i$ is not ultimately constant. Therefore we can define
$$i_k = \max\{j : w_j = \min\{w_i : |w_i| \le k\}\}.$$
In other words, $i_k$ is the largest index such that $w_{i_k}$ is the least word among the words $w_i$ of length $|w_i| \le k$ (there are only finitely many of them because $\mathcal{A}$ is finite). Then $|w_i| > k$ whenever $i > i_k$. Indeed, if $i > i_k$, then the inequality $w_{i_k} \ge w_i$ and the maximality of $i_k$ imply $w_{i_k} > w_i$. Because of the minimality of $w_{i_k}$, we thus have $|w_i| > k$. □

Corollary 2. If an infinite word $\mathbf{x} \in \mathcal{A}^{\mathbb{N}}$ has only finitely many distinct Lyndon factors, then it is ultimately periodic.

Proof. If $\mathbf{x}$ satisfies property (ii) or (iii) in Lemma 13, then it clearly has infinitely many Lyndon factors. Thus $\mathbf{x}$ satisfies property (i) and is ultimately periodic. □

Remark 2. The sleek proof of Corollary 2 was suggested by Tero Harju. The author's 
original proof was more intricate. 

Recall that the set of factors of an infinite word $\mathbf{x}$ is denoted by $F(\mathbf{x})$ and the set of its Lyndon factors by $L(\mathbf{x})$.

Theorem 2. If $\mathbf{x}$ and $\mathbf{y}$ are recurrent infinite words and $L(\mathbf{x}) = L(\mathbf{y})$, then $F(\mathbf{x}) = F(\mathbf{y})$.

Proof. Suppose first that $\mathbf{x}$ is ultimately periodic. Then, in fact, it is purely periodic because it is recurrent. Writing $\mathbf{x} = u^\omega$, where $u$ is a primitive word, we see that there is only one Lyndon factor of length $|u|$ or more, namely the Lyndon conjugate of $u$. Thus $\mathbf{y}$ has only finitely many Lyndon factors, so it is ultimately periodic by Corollary 2, and hence purely periodic because it is recurrent. Write $\mathbf{y} = v^\omega$ with $v$ primitive. Since the Lyndon conjugate of $u$ is a factor of $\mathbf{y}$ and the Lyndon conjugate of $v$ is a factor of $\mathbf{x}$, it follows that $v$ and $u$ are conjugates, and thus $F(\mathbf{x}) = F(\mathbf{y})$.

Suppose then that $\mathbf{x}$ is aperiodic. We show that every $u \in F(\mathbf{x})$ is a factor of a Lyndon factor of $\mathbf{x}$. Since $\mathbf{y}$ must be aperiodic as well, the analogous property holds for $\mathbf{y}$, implying that $F(\mathbf{x}) = F(\mathbf{y})$. If $\mathbf{x}$ satisfies property (ii) of Lemma 13, then some tail $\mathbf{x}'$ of $\mathbf{x}$ begins with arbitrarily long Lyndon words. Since $\mathbf{x}$ is recurrent, $u$ occurs in $\mathbf{x}'$, and it follows that $u$ is a factor of a Lyndon prefix of $\mathbf{x}'$. Thus suppose that $\mathbf{x}$ satisfies property (iii) of Lemma 13. Since $\mathbf{x}$ is recurrent, there exists a word $v$ such that $uvu \in F(\mathbf{x})$. Let $k = |uvu|$ and let $i_k$ be the index provided by Lemma 13. Since $\mathbf{x}$ is recurrent, the word $uvu$ occurs in $w_{i_k+1}w_{i_k+2}w_{i_k+3}\cdots$. Since $|w_{i_k+j}| > k$ for each $j \ge 1$, this occurrence overlaps at most two consecutive words, so one of the two occurrences of $u$ in it lies entirely inside a single word; that is, $u$ necessarily occurs in some $w_{i_k+j}$. □

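For an infinite word $\mathbf{x} \in \mathcal{A}^{\mathbb{N}}$, we define a mapping $\mathcal{L}_{\mathbf{x}}\colon \mathbb{N} \to \mathbb{N}$ such that $\mathcal{L}_{\mathbf{x}}(n)$ is the number of Lyndon factors of $\mathbf{x}$ of length at most $n$. Notice that the mapping is increasing, but not necessarily strictly increasing, as can be seen from Lemma 14. Notice also that $\mathcal{L}_{\mathbf{x}}$ determines the number of Lyndon factors of any length $k \ge 1$. Indeed, it is given by $\mathcal{L}_{\mathbf{x}}(1)$ for $k = 1$ and by the expression $\mathcal{L}_{\mathbf{x}}(k) - \mathcal{L}_{\mathbf{x}}(k - 1)$ for $k \ge 2$.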

Lemma 14. Let $\mathbf{f} \in \mathcal{A}^{\mathbb{N}}$ be a Fibonacci infinite word. Then for all $n \ge 1$, we have $\mathcal{L}_{\mathbf{f}}(n) = k$, where $k \ge 2$ is the unique integer such that $F_k \le n < F_{k+1}$.

Proof. Lemma 7(2) says that the Lyndon factors of $\mathbf{f}$ are precisely the Lyndon conjugates of the finite Fibonacci words $f_k$. Therefore, if $F_k \le n < F_{k+1}$, the Lyndon factors of length at most $n$ are the Lyndon conjugates of $f_1, f_2, \ldots, f_k$, so that $\mathcal{L}_{\mathbf{f}}(n) = k$. □
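Lemma 14 can be observed on a finite prefix of $\mathbf{f}$. The sketch below (hypothetical names) counts the distinct Lyndon factors of length at most $n$ occurring in the prefix $f_{17}$ of length 1597. A prefix count can only underestimate $\mathcal{L}_{\mathbf{f}}(n)$; the sketch assumes, as is the case for the uniformly recurrent Fibonacci word, that this prefix already contains every factor of length at most 30.

```python
def is_lyndon(w: str) -> bool:
    return len(w) > 0 and all(w < w[i:] for i in range(1, len(w)))


def fib(n: int) -> int:
    a, b = 1, 1
    for _ in range(n - 1):
        a, b = b, a + b
    return a


def fibonacci_prefix(k: int, f1: str = "b", f2: str = "a") -> str:
    """The finite Fibonacci word f_k (k >= 2), which is a prefix of f."""
    prev, cur = f1, f2
    for _ in range(k - 2):
        prev, cur = cur, cur + prev
    return cur


def lyndon_counts(prefix: str, n_max: int) -> list[int]:
    """counts[n] = number of distinct Lyndon factors of length <= n in prefix."""
    counts, total = [0], 0
    for m in range(1, n_max + 1):
        factors = {prefix[i:i + m] for i in range(len(prefix) - m + 1)}
        total += sum(1 for u in factors if is_lyndon(u))
        counts.append(total)
    return counts


def fib_index(n: int) -> int:
    """The unique k >= 2 with F_k <= n < F_{k+1}."""
    k = 2
    while fib(k + 1) <= n:
        k += 1
    return k


counts = lyndon_counts(fibonacci_prefix(17), 30)  # f_17 has length 1597
print(counts[1:9])  # [2, 3, 4, 4, 5, 5, 5, 6]
```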

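Theorem 3. If $\mathbf{x} \in \mathcal{A}^{\mathbb{N}}$ is aperiodic, then $\mathcal{L}_{\mathbf{x}} \ge \mathcal{L}_{\mathbf{f}}$.

Proof. Since $\mathcal{L}_{\mathbf{x}}$ is increasing, Lemma 14 implies that it suffices to show that $\mathcal{L}_{\mathbf{x}}(F_k) \ge k$ for all $k \ge 2$. This is clear for $k = 2, 3$ because $\mathbf{x}$ is aperiodic. Thus assume $k \ge 4$, and let $w$ be a shortest Lyndon factor of $\mathbf{x}$ of length at least $F_k$; Corollary 2 ensures that such a word $w$ exists because $\mathbf{x}$ is aperiodic. Theorem 1 implies that $\mathcal{L}(w) \ge k$. If $|w| = F_k$, then every Lyndon factor of $w$ is counted by $\mathcal{L}_{\mathbf{x}}(F_k)$, so $\mathcal{L}_{\mathbf{x}}(F_k) \ge \mathcal{L}(w) \ge k$. Otherwise $|w| > F_k$, and Theorem 1 gives $\mathcal{L}(w) \ge k + 1$. Since $w$ is as short as possible, all of its proper Lyndon factors are of length less than $F_k$. Therefore
$$\mathcal{L}_{\mathbf{x}}(F_k) \ge \mathcal{L}(w) - 1 \ge k.$$

□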

Remark 3. A classic result by Ehrenfeucht and Silberger [12] states that if an infinite word has only finitely many unbordered factors, then it is ultimately periodic. Since Lyndon words are unbordered, Theorem 3 is a quantitative formulation of this with an exact lower bound for the necessary number of unbordered factors.

Remark 4. Looking at Theorem 3, one might be tempted to postulate that if $\mathbf{x}$ is aperiodic, then for all $n \ge 1$, the number of Lyndon factors of length $n$ of $\mathbf{x}$ must be at least as big as the number of Lyndon factors of length $n$ of a Fibonacci infinite word, but this is not true. For example, let $\mathbf{f}$ be the Fibonacci infinite word over $\{a, b\}$ with $f_1 = b$ and $f_2 = a$, and let $\mathbf{x} = g(\mathbf{f})$, where $g$ is the morphism $a \mapsto aab$, $b \mapsto aaab$. Then $\mathbf{x}$ does not have any Lyndon factors of length 5, while $\mathbf{f}$ has the factor $aabab$.
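The example can be reproduced on finite prefixes. A prefix computation only shows absence within that prefix; the reason it holds for all of $\mathbf{x}$ is that $\mathbf{x}$ has no factor $bb$, $bab$, or $aaaa$ (every $b$ is followed by $aa$, and runs of $a$ have length 2 or 3), whereas each binary Lyndon word of length 5, namely $aaaab$, $aaabb$, $aabab$, $aabbb$, $ababb$, $abbbb$, contains one of them. The helper names in this sketch are hypothetical.

```python
def is_lyndon(w: str) -> bool:
    return len(w) > 0 and all(w < w[i:] for i in range(1, len(w)))


def fibonacci_prefix(k: int, f1: str = "b", f2: str = "a") -> str:
    prev, cur = f1, f2
    for _ in range(k - 2):
        prev, cur = cur, cur + prev
    return cur


def apply_morphism(word: str, images: dict[str, str]) -> str:
    return "".join(images[c] for c in word)


def lyndon_factors_of_length(word: str, m: int) -> set[str]:
    return {word[i:i + m] for i in range(len(word) - m + 1) if is_lyndon(word[i:i + m])}


f = fibonacci_prefix(15)                          # f_15, a prefix of f of length 610
x = apply_morphism(f, {"a": "aab", "b": "aaab"})  # a prefix of x = g(f)
print(lyndon_factors_of_length(x, 5))  # set()
print(lyndon_factors_of_length(f, 5))  # {'aabab'}
```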

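Theorem 4. Let $\mathbf{x} \in \mathcal{A}^{\mathbb{N}}$ be recurrent. Then $\mathbf{x}$ is aperiodic if and only if $\mathcal{L}_{\mathbf{x}} \ge \mathcal{L}_{\mathbf{f}}$, with equality if and only if $\mathbf{x}$ is in the shift orbit closure of a Fibonacci infinite word $\mathbf{f}$.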

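Proof. Let us first prove the first equivalence. Suppose $\mathbf{x}$ is ultimately periodic. Then it is purely periodic because it is recurrent. Thus $\mathcal{L}_{\mathbf{x}}$ is ultimately constant, so $\mathcal{L}_{\mathbf{x}}(n) < \mathcal{L}_{\mathbf{f}}(n)$ for all sufficiently large $n$. Conversely, if $\mathcal{L}_{\mathbf{x}}(n) < \mathcal{L}_{\mathbf{f}}(n)$ for some $n \ge 1$, then Theorem 3 implies that $\mathbf{x}$ is ultimately periodic.

Let us next prove the second equivalence. Obviously the condition that $\mathbf{x}$ be in the shift orbit closure of a Fibonacci word $\mathbf{f}$ is sufficient for the identity $\mathcal{L}_{\mathbf{x}} = \mathcal{L}_{\mathbf{f}}$ to hold. Let us prove the converse, and suppose that $\mathcal{L}_{\mathbf{x}} = \mathcal{L}_{\mathbf{f}}$. Then, in particular, $\mathcal{L}_{\mathbf{x}}(1) = \mathcal{L}_{\mathbf{f}}(1) = 2$, so that $\mathbf{x}$ consists of two distinct letters, say $a$ and $b$. Since $\mathcal{L}_{\mathbf{x}}(3) - \mathcal{L}_{\mathbf{x}}(2) = 1$, exactly one of $aab$ and $abb$ occurs in $\mathbf{x}$. If $aab$ is in $L(\mathbf{x})$, let $\mathbf{f} = \lim_{n\to\infty} f_n$ be the Fibonacci word with $f_1 = b$ and $f_2 = a$, so that $aab \in L(\mathbf{f})$. If $abb$ is in $L(\mathbf{x})$, let $\mathbf{f} = \lim_{n\to\infty} f_n$ be the Fibonacci word with $f_1 = a$ and $f_2 = b$, so that $abb \in L(\mathbf{f})$. Then a Lyndon word of length at most 3 is a factor of $\mathbf{x}$ if and only if it is a factor of $\mathbf{f}$. We will show next that $L(\mathbf{x}) = L(\mathbf{f})$; then Theorem 2 implies that $F(\mathbf{x}) = F(\mathbf{f})$ because both $\mathbf{x}$ and $\mathbf{f}$ are recurrent.

If $L(\mathbf{x}) \ne L(\mathbf{f})$, then the identity $\mathcal{L}_{\mathbf{x}} = \mathcal{L}_{\mathbf{f}}$ implies that there exist an integer $k$ and distinct Lyndon words $w$ and $z$ with $|w| = |z| = F_k$ such that $w$ is a factor of $\mathbf{x}$ but not of $\mathbf{f}$, and $z$ is a factor of $\mathbf{f}$ but not of $\mathbf{x}$. Let us assume that $k$ is as small as possible. Since the Lyndon factors of length at most 3 in $\mathbf{x}$ and $\mathbf{f}$ coincide, we have $|w| > 3$, and thus $k \ge 5$. Recall that $w$ can be written as $w = \lambda_w\mu_w$, where $\lambda_w$ and $\mu_w$ are Lyndon words by Lemma 6. Furthermore, each of $|\lambda_w|$ and $|\mu_w|$ is a Fibonacci number (by $\mathcal{L}_{\mathbf{x}} = \mathcal{L}_{\mathbf{f}}$ and Lemma 14, every Lyndon factor of $\mathbf{x}$ has Fibonacci length), and therefore $|\lambda_w| + |\mu_w| = |w| = F_k$ yields $\{|\lambda_w|, |\mu_w|\} = \{F_{k-1}, F_{k-2}\}$. The same reasoning shows that $z = \lambda_z\mu_z$ and

$\{|\lambda_z|, |\mu_z|\} = \{F_{k-1}, F_{k-2}\}$. But since $k \ge 5$, we have $F_{k-2} \ge 2$, which means that $\{\lambda_w, \mu_w\} = \{\lambda_z, \mu_z\}$, because the set of Lyndon factors of $\mathbf{x}$ of length less than $F_k$ coincides with the set of Lyndon factors of $\mathbf{f}$ of length less than $F_k$, and there is precisely one Lyndon factor of length $F_{k-2}$ and precisely one Lyndon factor of length $F_{k-1}$. Consequently, $z$ is a product of $\lambda_w$ and $\mu_w$. However, if $z = \lambda_w\mu_w$, then $z = w$, and if $z = \mu_w\lambda_w$, then $z$ is a conjugate of the Lyndon word $w$ different from $w$ and hence not a Lyndon word; both are contradictions. □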

Remark 5. Theorem 4 shows that the shift orbit closure of a Fibonacci infinite word $\mathbf{f}$ is characterized by the mapping $\mathcal{L}_{\mathbf{f}}$, up to renaming letters. But in general, the mapping $\mathcal{L}_{\mathbf{x}}$ does not determine a recurrent word $\mathbf{x}$. For example, the identity $\mathcal{L}_{\mathbf{x}} = \mathcal{L}_{\mathbf{y}}$ holds for the two periodic words $\mathbf{x} = (000001)^\omega$ and $\mathbf{y} = (000101)^\omega$, but clearly $F(\mathbf{x}) \ne F(\mathbf{y})$ and $F(\mathbf{x}) \ne F(c(\mathbf{y}))$, where $c$ is the morphism $0 \mapsto 1$, $1 \mapsto 0$.
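For purely periodic words the counting map can be computed exactly from one period, which makes this example easy to verify. The sketch below (hypothetical names) uses the facts that a factor of $u^\omega$ longer than $|u|$ has period $|u|$ and is therefore bordered, hence not Lyndon, and that every factor of length at most $|u|$ occurs in $uu$ at a starting position smaller than $|u|$.

```python
def is_lyndon(w: str) -> bool:
    return len(w) > 0 and all(w < w[i:] for i in range(1, len(w)))


def lyndon_factors_periodic(u: str) -> set[str]:
    """Lyndon factors of the purely periodic word u^omega (all have length <= |u|)."""
    uu, n = u + u, len(u)
    return {uu[i:i + m] for m in range(1, n + 1) for i in range(n)
            if is_lyndon(uu[i:i + m])}


def profile(lyndon: set[str], n_max: int) -> list[int]:
    """Values of the counting map at n = 1, ..., n_max for the given Lyndon factors."""
    return [sum(1 for w in lyndon if len(w) <= n) for n in range(1, n_max + 1)]


def factors_periodic(u: str, m: int) -> set[str]:
    """All factors of length m of u^omega."""
    uu = u * (m // len(u) + 2)
    return {uu[i:i + m] for i in range(len(u))}


x, y = "000001", "000101"
cy = y.translate(str.maketrans("01", "10"))  # c(y): exchange 0 and 1
print(profile(lyndon_factors_periodic(x), 8))  # [2, 3, 4, 5, 6, 7, 7, 7]
print(profile(lyndon_factors_periodic(y), 8))  # [2, 3, 4, 5, 6, 7, 7, 7]
```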